Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
We study minimax methods for off-policy evaluation (OPE) using value functions and marginalized importance weights. Although these methods hold the promise of overcoming the exponential variance of traditional importance sampling, several key problems remain: (1) They require function approximation and are generally biased. For the sake of trustworthy OPE, is there any way to quantify the biases? (2) They come in two styles ("weight-learning" vs. "value-learning"). Can the two styles be unified? In this paper we answer both questions positively. By slightly altering the derivation of previous methods (one from each style), we unify them into a single value interval that comes with a special type of double robustness: when either the value-function or the importance-weight class is well specified, the interval is valid and its length quantifies the misspecification of the other class.
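To make the interval concrete, below is a minimal brute-force sketch in a tabular setting. All names (L_hat, value_interval, the toy data) are hypothetical, and the code only illustrates the generic minimax shape under the standard normalized objective; the paper's exact endpoint assignment follows from its own derivation, and, as the reviews below emphasize, statistical estimation error is deliberately ignored.

```python
import numpy as np

# A minimal tabular sketch (hypothetical names, not the paper's code).
# L below is the standard minimax OPE objective shared by the
# value-learning and weight-learning styles:
#   L(q, w) = (1 - gamma) * E_{s0 ~ d0}[ q(s0, pi) ]
#           + E_D[ w(s, a) * ( r + gamma * q(s', pi) - q(s, a) ) ]
# In exact expectation, L(q, w) = J(pi) for every q when w is the true
# density ratio d^pi / d^D, and for every w when q = Q^pi.

def L_hat(q, w, data, init_states, pi, gamma):
    """Empirical minimax objective. q, w: (S, A) arrays; pi: (S, A) policy matrix."""
    init_term = (1.0 - gamma) * np.mean([pi[s0] @ q[s0] for s0 in init_states])
    bellman_term = np.mean([
        w[s, a] * (r + gamma * (pi[s2] @ q[s2]) - q[s, a])
        for (s, a, r, s2) in data
    ])
    return init_term + bellman_term

def value_interval(Q_class, W_class, data, init_states, pi, gamma):
    """Interval from the two minimax expressions over small finite classes.
    Ignoring estimation error, J(pi) lies between the two expressions
    whenever Q_class contains Q^pi or W_class contains d^pi / d^D."""
    e1 = max(min(L_hat(q, w, data, init_states, pi, gamma) for w in W_class)
             for q in Q_class)
    e2 = min(max(L_hat(q, w, data, init_states, pi, gamma) for w in W_class)
             for q in Q_class)
    return min(e1, e2), max(e1, e2)

# Toy usage: 2 states, 2 actions, random candidate classes.
rng = np.random.default_rng(0)
pi = np.array([[0.9, 0.1], [0.2, 0.8]])                   # target policy pi(a|s)
data = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 0)]   # (s, a, r, s') tuples
Q_class = [rng.uniform(0.0, 2.0, size=(2, 2)) for _ in range(20)]
W_class = [rng.uniform(0.5, 1.5, size=(2, 2)) for _ in range(20)]
lo, hi = value_interval(Q_class, W_class, data, [0, 1], pi, gamma=0.9)
print(f"value interval: [{lo:.3f}, {hi:.3f}]")
```

The double robustness shows up in the comments on L_hat: realizability of either class pins L to J(pi) in one of its arguments, so optimizing over the other class can only tighten, not invalidate, the resulting interval.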
Review for NeurIPS paper: Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Weaknesses: The study of the bias issue is important, but I am not fully convinced by the motivation for this so-called "confidence interval". Normally a confidence interval is designed for uncertainty quantification and is thus of great practical interest. However, since the authors explicitly point out that they do not consider uncertainty, this rules out all the important applications that a typical CI supports (safe RL, etc.), because the proposed interval will not be valid in practice due to estimation error. Thus, I can only view the contribution of this paper as a sort of additional guarantee for the algorithms proposed in "Minimax Weight and Q-Function Learning for Off-Policy Evaluation", since the algorithms are the same. Solely quantifying the bias of an existing estimator may not be viewed as a sufficiently significant contribution.
Meta-review for NeurIPS paper: Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
The paper provides a very general minimax framework for quantifying the bias/approximation error in off-policy evaluation, and the results apply to a range of OPE methods. Reviewers generally agree that this is a good paper with a clear contribution. One direction for improvement would be to quantify the statistical noise in off-policy evaluation, which is nontrivial but extremely important; reviewers, the AC, and the SAC agree that such analysis can be left for future work. We also strongly suggest that the authors rephrase or explain the wording "confidence interval".